perf(actions): lazy initialization of embedding indexes #1572

Pouyanpi · 2026-01-09T14:34:21Z

Description

Implements lazy initialization of embedding indexes (user_message_index, bot_message_index, flows_index, instruction_flows_index) so FastEmbed is only loaded when semantic search is actually needed.

Previously, embedding indexes were eagerly initialized at LLMRails construction time, causing FastEmbed models to be downloaded even for simple configurations that only use input/output rails or passthrough mode.

Behavior by Configuration

| Configuration | Before | After |
| --- | --- | --- |
| Input rails only (e.g., `self check input`) | Loads FastEmbed at init | No FastEmbed loaded |
| Output rails only (e.g., `self check output`) | Loads FastEmbed at init | No FastEmbed loaded |
| Input + Output rails | Loads FastEmbed at init | No FastEmbed loaded |
| Passthrough mode | Loads FastEmbed at init | No FastEmbed loaded |
| Dialog rails with user messages | Loads FastEmbed at init | Loads FastEmbed on first `generate()` |
| RAG with knowledge base | Loads FastEmbed at init | Unchanged (RAG has separate init) |

Implementation

Removed eager init() call from LLMGenerationActions.__init__()
Added _ensure_*_index() helper methods for lazy initialization
Actions call ensure methods before using indexes
Same pattern applied to V2.x in LLMGenerationActionsV2dotx
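
The pattern listed above can be sketched roughly as follows. This is a minimal illustration rather than the PR's code: `FakeIndex`, `LazyActions`, and the `build_count` counter are hypothetical stand-ins, and the real `_init_user_message_index()` presumably builds a FastEmbed-backed index rather than a plain list.

```python
import asyncio


class FakeIndex:
    """Hypothetical stand-in for an embedding index (no FastEmbed involved)."""

    def __init__(self, items):
        self.items = list(items)

    async def search(self, text):
        # Toy substring match; a real index would do semantic search.
        return [item for item in self.items if text.lower() in item.lower()]


class LazyActions:
    """Illustrative only: mirrors the shape of the change, not the actual class."""

    def __init__(self, user_messages):
        self.user_messages = user_messages
        self.user_message_index = None  # nothing is built at construction time
        self.build_count = 0  # hypothetical counter, for demonstration only

    async def _init_user_message_index(self):
        if not self.user_messages:
            return  # nothing to index, so the index stays None
        self.build_count += 1
        self.user_message_index = FakeIndex(self.user_messages)

    async def _ensure_user_message_index(self):
        # Build the index the first time it is needed, then reuse it.
        if self.user_message_index is None:
            await self._init_user_message_index()

    async def generate_user_intent(self, text):
        await self._ensure_user_message_index()
        if self.user_message_index is None:
            return []
        return await self.user_message_index.search(text)


async def demo():
    actions = LazyActions(["hello there", "goodbye"])
    assert actions.user_message_index is None  # still lazy after __init__
    first = await actions.generate_user_intent("hello")
    await actions.generate_user_intent("bye")  # second call reuses the index
    return first, actions.build_count


result, builds = asyncio.run(demo())
```

A configuration whose actions never call the ensure helper never pays the build cost, which is the behavior the table above describes for input/output rails and passthrough mode.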

Test Plan

Added comprehensive tests for V1.0 (tests/test_actions_llm_embedding_lazy_init.py)
Added comprehensive tests for V2.x (tests/v2_x/test_llm_embedding_lazy_init.py)
Tests verify indexes are None at initialization
Tests verify FastEmbed cache stays empty for simple configs
Tests verify FastEmbed IS downloaded when dialog rails are used

greptile-apps

Greptile Overview

Greptile Summary

This PR successfully implements lazy initialization of embedding indexes to improve startup performance and reduce unnecessary FastEmbed downloads.

Key Changes

V1.0 (nemoguardrails/actions/llm/generation.py)

Removed eager init() call from __init__ that was running in a separate thread
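Added _init_lock (asyncio.Lock) so lazy initialization is safe across concurrent async tasks (asyncio.Lock coordinates coroutines on one event loop; it is not thread-safe)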
Implemented three _ensure_*_index() methods using double-checked locking pattern
Updated all action methods (generate_user_intent, generate_next_step, generate_bot_message, generate_value, generate_intent_steps_message) to call ensure methods before index usage
Removed unnecessary imports: threading, check_sync_call_from_async_loop, get_or_create_event_loop
Added early return condition in _init_bot_message_index() when no user_messages exist

V2.x (nemoguardrails/actions/v2_x/generation.py)

Implemented _ensure_flows_index() and _ensure_instruction_flows_index() methods
Uses hasattr checks for instruction_flows_index since it's created dynamically
Updated all methods that use flow indexes to call ensure methods first
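
A rough sketch of the `hasattr` guard mentioned above, under stated assumptions: `FlowActionsSketch` and `instruction_flows` are hypothetical names, a plain list stands in for the real index, and only the `instruction_flows_index` attribute name and the `_ensure_instruction_flows_index()` method name come from the summary.

```python
import asyncio


class FlowActionsSketch:
    """Hypothetical class; not the actual LLMGenerationActionsV2dotx."""

    def __init__(self, instruction_flows):
        self.instruction_flows = instruction_flows
        # instruction_flows_index is deliberately NOT set here, mimicking
        # an attribute that is only created dynamically later on.

    async def _ensure_instruction_flows_index(self):
        # hasattr avoids an AttributeError before the first build.
        if not hasattr(self, "instruction_flows_index"):
            # A plain list stands in for the real embedding index.
            self.instruction_flows_index = list(self.instruction_flows)
        return self.instruction_flows_index


async def demo():
    actions = FlowActionsSketch(["flow greet", "flow farewell"])
    before = hasattr(actions, "instruction_flows_index")
    index = await actions._ensure_instruction_flows_index()
    after = hasattr(actions, "instruction_flows_index")
    return before, after, index


before, after, index = asyncio.run(demo())
```

Setting the attribute to None in `__init__` and testing `is None` would be an alternative to `hasattr`; the summary does not say why one was chosen over the other.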

Behavior Changes

Input/output rails only: FastEmbed no longer loaded at initialization
Passthrough mode: No embedding index initialization
Dialog rails with user messages: FastEmbed loaded on first generate() call
All existing functionality preserved, just deferred until actually needed

Test Coverage

Comprehensive tests verify indexes are None after initialization
Tests verify FastEmbed cache stays empty for simple configurations
Tests verify proper lazy loading when dialog rails are used
Concurrent initialization tests check that double-checked locking stays safe when multiple async tasks initialize at once

Confidence Score: 5/5

Safe to merge - well-implemented lazy initialization with proper safety under concurrent async tasks and comprehensive test coverage.
The implementation uses the standard double-checked locking pattern with asyncio.Lock correctly. All action methods that access embedding indexes have been updated with appropriate _ensure_*_index() calls. The code properly handles edge cases where indexes may remain None after initialization (e.g., empty message lists). Comprehensive test coverage validates both the lazy behavior and the behavior under concurrent async tasks. No breaking changes to existing functionality.
No files require special attention

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| nemoguardrails/actions/llm/generation.py | 5/5 | Implemented lazy initialization for embedding indexes with proper double-checked locking pattern. Removed eager init() call and unnecessary imports (threading, patch_asyncio). All action methods properly call ensure methods before index usage. |
| nemoguardrails/actions/v2_x/generation.py | 5/5 | Added lazy initialization for V2.x with _ensure_flows_index() and _ensure_instruction_flows_index() methods. Properly handles instruction_flows_index attribute that may not exist initially using hasattr checks. |
| tests/test_actions_llm_embedding_lazy_init.py | 5/5 | Comprehensive test coverage for V1.0 lazy initialization including verification that indexes are None at init, FastEmbed cache remains empty for simple configs, and indexes are initialized on first use. |
| tests/v2_x/test_llm_embedding_lazy_init.py | 5/5 | Test coverage for V2.x lazy initialization including passthrough mode and dialog configurations. Verifies indexes and instruction_flows_index remain uninitialized for simple configs. |

Sequence Diagram

sequenceDiagram
    participant User
    participant LLMRails
    participant LLMGenerationActions
    participant EmbeddingIndex

    Note over LLMRails,LLMGenerationActions: Before PR: Eager Initialization
    User->>LLMRails: __init__(config)
    LLMRails->>LLMGenerationActions: __init__()
    LLMGenerationActions->>LLMGenerationActions: init() in thread
    LLMGenerationActions->>EmbeddingIndex: _init_user_message_index()
    EmbeddingIndex-->>LLMGenerationActions: FastEmbed loaded
    LLMGenerationActions->>EmbeddingIndex: _init_bot_message_index()
    LLMGenerationActions->>EmbeddingIndex: _init_flows_index()
    LLMGenerationActions-->>LLMRails: Ready (all indexes loaded)

    Note over LLMRails,LLMGenerationActions: After PR: Lazy Initialization
    User->>LLMRails: __init__(config)
    LLMRails->>LLMGenerationActions: __init__()
    Note over LLMGenerationActions: No init() call!
    LLMGenerationActions-->>LLMRails: Ready (no indexes loaded)
    
    User->>LLMRails: generate(messages)
    LLMRails->>LLMGenerationActions: generate_user_intent()
    LLMGenerationActions->>LLMGenerationActions: _ensure_user_message_index()
    alt user_message_index is None and user_messages exist
        LLMGenerationActions->>LLMGenerationActions: acquire _init_lock
        LLMGenerationActions->>LLMGenerationActions: check again if None
        LLMGenerationActions->>EmbeddingIndex: _init_user_message_index()
        EmbeddingIndex-->>LLMGenerationActions: FastEmbed loaded (lazy)
    end
    LLMGenerationActions->>EmbeddingIndex: search(text)
    EmbeddingIndex-->>LLMGenerationActions: results
    LLMGenerationActions-->>LLMRails: intent

    Note over User,EmbeddingIndex: Input/Output Rails Only: No Indexes Loaded
    User->>LLMRails: __init__(config with input rails)
    LLMRails->>LLMGenerationActions: __init__()
    LLMGenerationActions-->>LLMRails: Ready (no indexes)
    User->>LLMRails: generate(messages)
    Note over LLMGenerationActions: No embedding indexes needed
    LLMRails-->>User: response (FastEmbed never loaded)

nemoguardrails/actions/v2_x/generation.py

github-actions · 2026-01-09T15:31:44Z

Documentation preview

https://nvidia-nemo.github.io/Guardrails/review/pr-1572

codecov · 2026-01-09T16:41:47Z

Codecov Report

❌ Patch coverage is 82.22222% with 8 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| nemoguardrails/actions/v2_x/generation.py | 61.11% | 7 Missing ⚠️ |
| nemoguardrails/actions/llm/generation.py | 96.29% | 1 Missing ⚠️ |
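
The per-file percentages agree with the overall 82.22% figure if one back-solves the changed-line counts. The totals used below (18 and 27 changed lines) are inferred from the reported percentages and missing-line counts; the report itself does not state them.

```python
from fractions import Fraction

# (missing lines, inferred total changed lines) per file.
# The totals are an inference from the report, not values it lists.
files = {
    "nemoguardrails/actions/v2_x/generation.py": (7, 18),
    "nemoguardrails/actions/llm/generation.py": (1, 27),
}


def coverage(missing, total):
    """Exact covered fraction, expressed as a percentage."""
    return Fraction(total - missing, total) * 100


per_file = {name: float(coverage(m, t)) for name, (m, t) in files.items()}
total_missing = sum(m for m, _ in files.values())
total_lines = sum(t for _, t in files.values())
overall = float(coverage(total_missing, total_lines))
```

Under these inferred totals, 8 of 45 changed lines are uncovered, and most of the gap sits in the V2.x file.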

Pouyanpi · 2026-01-09T16:45:11Z

@greptileai re-review

Uses asyncio.Lock with double-checked locking pattern to prevent race conditions when multiple async tasks call _ensure_*_index() methods concurrently. Also fixes tests to use TestChat (with FakeLLM) instead of creating LLMRails directly, and adds skip conditions for fastembed tests.
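
To illustrate the race this guards against, here is a small self-contained comparison. It is a sketch under assumptions: `IndexHolder`, `ensure_naive`, `ensure_locked`, and the `builds` counter are hypothetical names, and `asyncio.sleep(0)` stands in for whatever awaiting the real index build does.

```python
import asyncio


class IndexHolder:
    """Hypothetical holder used to compare two ensure strategies."""

    def __init__(self):
        self.index = None
        self.builds = 0
        self._init_lock = asyncio.Lock()

    async def _build(self):
        await asyncio.sleep(0)  # yield to the loop, as an async build would
        self.builds += 1
        self.index = object()

    async def ensure_naive(self):
        # Check-then-await: other tasks can pass the check before the
        # first build has finished, so the build may run several times.
        if self.index is None:
            await self._build()

    async def ensure_locked(self):
        if self.index is None:  # fast path: no lock once the index exists
            async with self._init_lock:
                if self.index is None:  # re-check after acquiring the lock
                    await self._build()


async def run(method_name, tasks=5):
    holder = IndexHolder()  # created inside the running event loop
    await asyncio.gather(*(getattr(holder, method_name)() for _ in range(tasks)))
    return holder.builds


naive_builds = asyncio.run(run("ensure_naive"))
locked_builds = asyncio.run(run("ensure_locked"))
```

With the naive check the build runs more than once when tasks overlap; with the lock and the second check it runs once. Note that `asyncio.Lock` only coordinates coroutines on a single event loop, so this does not make the helper safe to call from multiple threads.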

fix

trebedea

Looks good, but we should also remove the init() function.

nemoguardrails/actions/llm/generation.py

perf(actions): lazy initialization of embedding indexes

9f7f932

greptile-apps bot reviewed Jan 9, 2026

nemoguardrails/actions/v2_x/generation.py (outdated, resolved)

nemoguardrails/actions/v2_x/generation.py (outdated, resolved)

Pouyanpi marked this pull request as draft January 9, 2026 14:39

Pouyanpi marked this pull request as ready for review January 9, 2026 15:30

Pouyanpi force-pushed the perf/lazy-init-indices branch 3 times, most recently from aa30386 to 46e4df9 Compare January 9, 2026 16:34

greptile-apps bot reviewed Jan 9, 2026

Pouyanpi added 2 commits January 12, 2026 16:10

simplify fastembed index initialization test

ce6cd1e

fix

Pouyanpi force-pushed the perf/lazy-init-indices branch from 46e4df9 to ce6cd1e Compare January 12, 2026 15:10

trebedea approved these changes Jan 14, 2026

nemoguardrails/actions/llm/generation.py (outdated, resolved)

nemoguardrails/actions/llm/generation.py (resolved)

remove init from generation.py

2a0f3a4
